Joint Clustering and Classification for Multiple Instance Learning
Abstract
The Multiple Instance Learning (MIL) framework has been extensively used to solve weakly labeled visual classification problems, where each image or video is treated as a bag of instances. Instance-space based MIL algorithms adapt standard classifiers by defining the probability that a bag belongs to the target class as the maximum of the probabilities that its instances belong to the target class. Although these are the most commonly used MIL algorithms, they do not account for the possibility that the instances may have multiple intermediate concepts, and that these concepts may have unequal weighting in predicting the overall target class. Embedding-space (ES) based MIL approaches, on the other hand, tackle this issue by defining a set of concepts, embedding each bag into a concept space, and then training a standard classifier in the embedding space. In previous ES based approaches, the concepts were discovered separately from the classifier, and thus were not optimized for the final classification task. Here we propose a novel algorithm that estimates concepts and classifier parameters by jointly optimizing a classification loss. This approach discovers a small set of discriminative concepts, which yield superior classification performance. The proposed algorithm is referred to as Joint Clustering Classification for MIL data (JC2MIL) because the discovered concepts induce clusters of data instances. In comparison to previous approaches, JC2MIL obtains state-of-the-art results on several MIL datasets: Corel-2000, image annotation datasets (Elephant, Tiger and Fox), and the UCSB Breast Cancer dataset.
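The two bag-level rules contrasted in the abstract can be illustrated with a minimal sketch. It is not the JC2MIL algorithm itself, and the joint optimization of concepts and classifier is not shown. The logistic instance classifier, the Gaussian similarity, the max pooling over instances in the embedding, and all function and parameter names (`instance_prob`, `bag_prob_max`, `embed_bag`, `sigma`) are assumptions chosen for illustration.

```python
import math

def instance_prob(x, w, b):
    """Hypothetical logistic instance classifier: P(instance is target class)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def bag_prob_max(bag, w, b):
    """Instance-space rule: bag probability is the max over instance probabilities."""
    return max(instance_prob(x, w, b) for x in bag)

def embed_bag(bag, concepts, sigma=1.0):
    """Embedding-space rule (one possible choice, assumed here):
    coordinate k is the Gaussian similarity between concept k
    and the bag instance closest to it."""
    def similarity(x, c):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return math.exp(-dist_sq / sigma ** 2)
    return [max(similarity(x, c) for x in bag) for c in concepts]

# A bag with two 2-D instances and two made-up concept points.
bag = [[0.0, 0.0], [1.0, 1.0]]
concepts = [[1.0, 1.0], [5.0, 5.0]]

p_bag = bag_prob_max(bag, w=[1.0, 1.0], b=0.0)  # driven by the instance [1, 1]
embedding = embed_bag(bag, concepts)            # one coordinate per concept
```

In the embedding-space setting, a standard classifier would then be trained on vectors such as `embedding`, which lets different concepts receive different weights; in this sketch the concepts are simply given rather than learned.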
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
M3IC: Maximum Margin Multiple Instance Clustering
Clustering, classification, and regression are three major research topics in machine learning. So far, much work has been conducted in solving multiple instance classification and multiple instance regression problems, where supervised training patterns are given as bags and each bag consists of some instances. But the research on unsupervised multiple instance clustering is still limited. T...
Multiple Instance Learning with the Optimal Sub-Pattern Assignment Metric
Multiple instance data are sets or multi-sets of unordered elements. Using metrics or distances for sets, we propose an approach to several multiple instance learning tasks, such as clustering (unsupervised learning), classification (supervised learning), and novelty detection (semi-supervised learning). In particular, we introduce the Optimal Sub-Pattern Assignment metric to multiple instance ...
IRDDS: Instance reduction based on Distance-based decision surface
In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...
Different Learning Levels in Multiple-choice and Essay Tests: Immediate and Delayed Retention
This study investigated the effects of different learning levels, including Remember an Instance (RI), Remember a Generality (RG), and Use a Generality (UG) in multiple-choice and essay tests on immediate and delayed retention. Three-hundred pre-intermediate students participated in the study. Reading passages with multiple-choice and essay questions in different levels of learning were giv...